Implementing a Fast Lucas-Lehmer Test on Programmable Graphics Hardware
Abstract
The Lucas-Lehmer test provides a deterministic algorithm for testing whether, for a prime number p, the Mersenne number M_p = 2^p - 1 is also prime. The current work demonstrates that this test can be implemented effectively on a parallel graphics processing unit (GPU). The parallelization was achieved by two main methods: (1) fast multiplication using parallel fast Fourier transforms in extended precision; (2) fast parallel carry addition for arbitrary-precision numbers. Extended precision is necessary in the Fourier transforms to allow single-precision graphics hardware to achieve sufficient precision for tests on non-trivial values of M_p. Methods (1) and (2) allow data to remain on the graphics card throughout the test and minimize the runtime cost of bus traffic between the host and the GPU. The algorithm has been implemented in the Cg language and tested on several hardware platforms. The current work demonstrates the viability of current and future GPUs for number-theoretic computation. [Addenda (2009): While actual implementations of this were not competitive with highly optimized sequential algorithms such as those used by GIMPS, a similar implementation using modern double-precision GPU hardware and CUDA kernels, rather than Cg shaders, might produce superior runtimes to sequential algorithms.]
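As a rough illustration of the test the abstract describes, the following Python sketch runs the standard Lucas-Lehmer recurrence (s_0 = 4, s_{i+1} = s_i^2 - 2 mod M_p, with M_p prime exactly when s_{p-2} = 0) in two ways: once with built-in big integers, and once with the number split into limbs so that the squaring and the carry step appear as separate stages. This is not the paper's Cg implementation. The function names and the 8-bit limb size are assumptions made for the example, and a schoolbook convolution and a sequential carry loop stand in for the parallel extended-precision FFT and the parallel carry addition.

```python
LIMB_BITS = 8            # assumed limb width, chosen only for illustration
BASE = 1 << LIMB_BITS


def lucas_lehmer(p: int) -> bool:
    """Reference version: True if M_p = 2**p - 1 is prime, for a prime p."""
    if p == 2:
        return True      # M_2 = 3 is prime; the loop below needs p > 2
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0


def to_limbs(n: int) -> list[int]:
    """Split a non-negative integer into little-endian base-BASE limbs."""
    limbs = [n % BASE]
    n //= BASE
    while n:
        limbs.append(n % BASE)
        n //= BASE
    return limbs


def from_limbs(limbs: list[int]) -> int:
    """Reassemble an integer from normalized little-endian limbs."""
    return sum(d << (LIMB_BITS * i) for i, d in enumerate(limbs))


def square_limbs(a: list[int]) -> list[int]:
    """Acyclic self-convolution of the limbs (stand-in for FFT squaring).

    The outputs are unnormalized: each entry may exceed BASE.
    """
    out = [0] * (2 * len(a) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(a):
            out[i + j] += x * y
    return out


def propagate_carries(digits: list[int]) -> list[int]:
    """Sequential carry pass (stand-in for the parallel carry addition)."""
    result, carry = [], 0
    for d in digits:
        carry += d
        result.append(carry % BASE)
        carry //= BASE
    while carry:
        result.append(carry % BASE)
        carry //= BASE
    return result


def lucas_lehmer_limbs(p: int) -> bool:
    """Same test, with squaring done as convolution followed by carries."""
    if p == 2:
        return True
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        squared = from_limbs(propagate_carries(square_limbs(to_limbs(s))))
        s = (squared - 2) % m
    return s == 0
```

Calling `lucas_lehmer(p)` and `lucas_lehmer_limbs(p)` for small prime exponents should give the same answers. In the limb version the modular reduction is still done on a Python integer for brevity; an implementation that keeps data on the GPU would have to fold the reduction into the limb arithmetic as well.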
Similar resources
Computation on GPUs: From a Programmable Pipeline to an Efficient Stream Processor
The recent development of graphics hardware is changing the implementation of the graphics pipeline from a fixed set of functions to user-developed special programs executed on a per-vertex or per-fragment basis. This programmability allows the efficient implementation of different algorithms directly on the graphics hardware. In this tutorial we will present the main techn...
From Behavioral to RTL Design Flow in SystemC LLR-PROSILOG scientific collaboration
This paper reports the scientific collaboration between LLR and PROSILOG. The aim of this collaboration was to show that a system can be quickly implemented in an FPGA using SystemC as the sole description language. Starting from the behavioral abstraction level, the model is refined down to RTL before hardware synthesis, then automatically translated to the equivalent model in VHDL or ...
Implementation of a High Throughput 3GPP Turbo Decoder on GPU
Turbo code is a computationally intensive channel code that is widely used in current and upcoming wireless standards. General-purpose graphics processor unit (GPGPU) is a programmable commodity processor that achieves high performance computation power by using many simple cores. In this paper, we present a 3GPP LTE compliant Turbo decoder accelerator that takes advantage of the processing pow...
Secure FPGA Design by Filling Unused Spaces
Nowadays there are different kinds of attacks on Field Programmable Gate Arrays (FPGAs). As FPGAs are used in many different applications, their security becomes an important concern, especially in Internet of Things (IoT) applications. Hardware Trojan Horse (HTH) insertion is one of the major security threats that can be implemented in unused space of the FPGA. This unused space is unavoidable to ...
Implementing a Programmable Pixel Pipeline in FPGAs
Complex three-dimensional graphics rendering is a computationally very intensive process, so even the newest microprocessors cannot handle more complicated scenes in real time. Therefore, to produce realistic rendering, hardware solutions are required. This paper discusses an FPGA implementation which supports programmable pixel computing.
Publication date: 2009